Abstract 


A wavelet basis selection procedure is presented for wavelet regres- 
sion. Both the basis and the threshold are selected using cross- 
validation. The method includes the capability of incorporating 
prior knowledge on the smoothness (or shape of the basis functions) 
into the basis selection procedure. The method is demonstrated on 
sampled functions widely used in the wavelet regression literature, 
and its results are contrasted with other published methods. 


1 INTRODUCTION 

Wavelet regression is a technique which attempts to recover a sampled function 
corrupted with noise. This is done by thresholding the small wavelet decomposition 
coefficients which represent mostly noise. Most of the papers published on wavelet 
regression have concentrated on the threshold selection process. This paper focuses 
on the effect that different wavelet bases have on cross-validation based threshold 
selection, and the error in the final result. This paper also suggests how prior 
information may be incorporated into the basis selection process, and the effects 
of choosing a wrong prior. Both orthogonal and biorthogonal wavelet bases were 
explored. 

Wavelet regression is performed in three steps. The first step is to apply a discrete 
wavelet transform to the sampled data to produce decomposition coefficients. Next 
a threshold is applied to the coefficients. Then an inverse discrete wavelet transform 
is applied to these modified coefficients. 
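
The three steps can be sketched in a few lines. The following is a minimal illustration only, assuming an orthonormal Haar basis, a soft threshold, and a signal length that is a power of two. The function names (`haar_dwt`, `haar_idwt`, `soft`, `wavelet_regression`) are hypothetical and this is not the implementation used to produce the results reported here.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar decomposition of a length-2^k signal."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # wavelet coefficients
        approx = (even + odd) / np.sqrt(2.0)         # scaling coefficients
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, coarsest level first."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def soft(y, t):
    """Soft threshold: shrink magnitudes by t, zeroing anything below t."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def wavelet_regression(x, t):
    approx, details = haar_dwt(x)             # step 1: forward transform
    details = [soft(d, t) for d in details]   # step 2: threshold
    return haar_idwt(approx, details)         # step 3: inverse transform
```

With a threshold of zero the signal is returned unchanged, and with a very large threshold only the sample mean survives, which gives two simple sanity checks on any such implementation.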


The basis selection procedure is demonstrated to perform better than other wavelet 
regression methods even when the wrong prior on the space of the basis selections 
is specified. 

This paper is broken into the following sections. The background section gives a 
brief summary of the mathematical requirements of the discrete wavelet transform. 
This section is followed by a methodology section which outlines the basis selection 
algorithms, and the process for obtaining the presented results. This is followed by 
a results section and then a conclusion. 

2 BACKGROUND 

2.1 DISCRETE WAVELET TRANSFORM 

The Discrete Wavelet Transform (DWT) [Daubechies, 92] is implemented as a 
series of projections onto scaling functions in $L^2(R)$. The initial assumption is 
that the original data samples lie in the finest space $V_0$ spanned by the scaling 
function $\phi \in V_0$ such that the collection $\{\phi(x - l) \mid l \in Z\}$ is a 
Riesz basis of $V_0$. The first level of the dyadic decomposition then consists of 
projecting the data samples onto scaling functions which have been dilated to be 
twice as wide as the original $\phi$ and which span the coarser space $V_{-1}$: 
$\{\phi(2^{-1}x - l) \mid l \in Z\}$. The information that is lost going from the 
finer to the coarser scale is retained in what are known as wavelet coefficients. 
Instead of differencing, the wavelet coefficients can be obtained via a projection 
operation onto the wavelet basis functions which span a space known as $W_0$. 
The projections are typically implemented using Quadrature Mirror Finite Impulse 
Response Filters (QMFIR). The next level of decomposition is obtained by again 
doubling the scaling functions and projecting the first scaling decomposition 
coefficients onto these functions. The difference in information between these two 
levels is contained in the wavelet coefficients for this level. In general, the 
scaling functions for level $j$ and translation $m$ may be represented by: 
$\phi_{j,m}(t) = 2^{-j/2} \phi(2^{-j}t - m)$ 
where $t \in [0, 2^k - 1]$, $k \ge 1$, $1 \le j \le k$, $0 \le m \le 2^{k-j} - 1$. 
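
As a worked instance, and only as an illustration, take the Haar scaling function (the indicator of the unit interval) and read the level-$j$, translation-$m$ scaling function as $\phi_{j,m}(t) = 2^{-j/2}\phi(2^{-j}t - m)$. The choice of Haar here is an assumption made for concreteness.

```latex
% Haar scaling function: indicator of the unit interval
\phi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases}

% Dilated and translated copy at level j, translation m
\phi_{j,m}(t) = 2^{-j/2}\,\phi(2^{-j}t - m)
             = \begin{cases} 2^{-j/2}, & 2^{j}m \le t < 2^{j}(m+1) \\ 0, & \text{otherwise} \end{cases}

% Unit norm at every level: height squared times support width
\int \phi_{j,m}(t)^2 \, dt = 2^{-j} \cdot 2^{j} = 1

% With 0 <= m <= 2^{k-j} - 1, the supports tile [0, 2^k) without overlap
\bigcup_{m=0}^{2^{k-j}-1} \left[\, 2^{j}m,\; 2^{j}(m+1) \right) = [\,0,\; 2^{k})
```

Each increase in $j$ doubles the support width and halves the number of translations, which is the dyadic structure described above.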

2.1.1 Orthogonal 

An orthogonal wavelet decomposition is defined such that the difference space $W_j$ 
is the orthogonal complement of $V_j$ in $V_{j+1}$: $W_j \perp V_j$, which means 
that the projection of the wavelet functions onto the scaling functions on a level 
is zero: 

$$\langle \psi, \phi(\cdot - l) \rangle = 0, \quad l \in Z$$

This results in the wavelet spaces $W_j$ with $j \in Z$ being all mutually 
orthogonal. The refinement relations for an orthogonal decomposition may be 
written as: 
$\phi(x) = 2 \sum_k h_k \phi(2x - k)$ and $\psi(x) = 2 \sum_k g_k \phi(2x - k)$. 
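
The orthogonality conditions translate into simple algebraic conditions on the filter coefficients, which can be checked numerically. The sketch below assumes the normalization in which the low-pass coefficients sum to one (so that a factor of 2 appears in the refinement relation), uses the four-tap Daubechies filter as the example, and builds the high-pass filter by the alternating-flip convention. The helper names are hypothetical.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Four-tap Daubechies low-pass filter, scaled so that sum(h) == 1,
# matching the factor 2 in phi(x) = 2 * sum_k h_k phi(2x - k).
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
# Alternating-flip high-pass filter (one common convention, assumed here).
g = np.array([h[3], -h[2], h[1], -h[0]])

def shifted_inner(a, b, shift):
    """sum_k a[k] * b[k + shift], treating out-of-range taps as zero."""
    n = len(a)
    return sum(a[k] * b[k + shift] for k in range(n) if 0 <= k + shift < n)

def is_orthogonal_pair(h, g, tol=1e-12):
    """Check 2*<h, h(.+2m)> = delta_m, 2*<g, g(.+2m)> = delta_m, <h, g(.+2m)> = 0."""
    n = len(h)
    for m in range(-(n // 2) + 1, n // 2):
        target = 1.0 if m == 0 else 0.0
        if abs(2 * shifted_inner(h, h, 2 * m) - target) > tol:
            return False
        if abs(2 * shifted_inner(g, g, 2 * m) - target) > tol:
            return False
        if abs(shifted_inner(h, g, 2 * m)) > tol:
            return False
    return True
```

Calling `is_orthogonal_pair(h, g)` on the filters above returns `True`; a filter such as `[0.25, 0.5, 0.25, 0]` fails the first condition.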

2.1.2 Biorthogonal 

Symmetry is an important property when considering using the scaling 
functions as interpolator functions. Most commonly used interpolator functions 
are symmetric. 

Daubechies [Daubechies, 92] mentions that it is well known in the subband filtering 
community that symmetry and exact reconstruction are incompatible if the same 
FIR filters are used for reconstruction and decomposition, except for the Haar 
filter. If we are willing to use different filters for the analysis and synthesis 
banks, then symmetry and exact reconstruction are possible. Biorthogonal wavelets 
have a dual scaling function $\tilde{\phi}$ and a dual wavelet $\tilde{\psi}$. These 
generate a dual multiresolution analysis with subspaces $\tilde{V}_j$ and 
$\tilde{W}_j$ so that $\tilde{V}_j \perp W_j$ and $V_j \perp \tilde{W}_j$, and the 
orthogonality conditions can now be written as: 

$$\langle \tilde{\phi}, \psi(\cdot - l) \rangle = \langle \tilde{\psi}, \phi(\cdot - l) \rangle = 0$$

$$\langle \tilde{\phi}_{j,l}, \phi_{j,m} \rangle = \delta_{l-m} \quad \text{for } l, m, j \in Z$$

$$\langle \tilde{\psi}_{j,l}, \psi_{k,m} \rangle = \delta_{j-k}\,\delta_{l-m} \quad \text{for } l, m, j, k \in Z$$

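The refinement relations for biorthogonal wavelets can be written: 

$$\phi(x) = 2 \sum_k h_k \phi(2x - k) \quad \text{and} \quad \psi(x) = 2 \sum_k g_k \phi(2x - k)$$

$$\tilde{\phi}(x) = 2 \sum_k \tilde{h}_k \tilde{\phi}(2x - k) \quad \text{and} \quad \tilde{\psi}(x) = 2 \sum_k \tilde{g}_k \tilde{\phi}(2x - k)$$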

Basically this means that the scaling functions at one level are composed of linear 
combinations of scaling functions at the next finer level. The wavelet functions at 
one level are also composed of linear combinations of the scaling functions at the 
next finer level. 

2.2 LIFTING AND SECOND GENERATION WAVELETS 

Sweldens' lifting scheme [Sweldens, 95a] is a way to transform a biorthogonal wavelet 
decomposition obtained from low order filters to one that could be obtained from 
higher order filters (more FIR filter coefficients) without applying the longer filters 
and thus saving computations. This method can be used to increase the number 
of vanishing moments of the wavelet, or change the shape of the wavelet. This 
means that several different filters (i.e. sets of basis functions) may be applied with 
properties relevant to the problem domain in a manner more efficient than directly 
applying the filters individually. This is beneficial to performing the search over the 
space of admissible basis functions meeting the problem domain requirements. 

Sweldens' Second Generation Wavelets [Sweldens, 95b] are a result of applying lifting 
to simple interpolating biorthogonal wavelets and redefining the refinement relation 
of the dual wavelet to be: 

$$\psi(x) = \phi(2x - 1) - \sum_k a_k \phi(x - k)$$

where the $a_k$ are the lifting parameters. The lifting parameters may be selected to 
achieve desired properties in the basis functions relative to the problem domain. 
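
To make the role of a lifting parameter concrete, the sketch below performs one predict step and one update step in the style of the piecewise linear (2,2) biorthogonal wavelet. It is an illustration under stated assumptions: a single scalar `a` stands in for the lifting parameters $a_k$, boundaries are treated as periodic, the signal length is even, and the function names are hypothetical.

```python
import numpy as np

def lift_forward(x, a=0.25):
    """One level of a predict/update lifting step with periodic boundaries."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    # Predict: each odd sample from the average of its two even neighbours.
    d = odd - 0.5 * (even + np.roll(even, -1))
    # Update: correct the even samples using neighbouring details, scaled by a.
    s = even + a * (np.roll(d, 1) + d)
    return s, d

def lift_inverse(s, d, a=0.25):
    """Undo lift_forward by running the same steps in reverse order."""
    even = s - a * (np.roll(d, 1) + d)
    odd = d + 0.5 * (even + np.roll(even, -1))
    x = np.empty(2 * s.size)
    x[0::2], x[1::2] = even, odd
    return x
```

Because the inverse simply reverses the two steps, reconstruction is exact for any value of `a`; changing `a` changes the shape of the resulting basis functions without breaking invertibility. With `a = 0.25` the coarse coefficients keep the mean of the input, which is one example of a property that can be obtained by choosing the parameter.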

Prior information for a particular application domain may now be incorporated into 
the basis selection for wavelet regression. For example, if a particular application 
requires that there be a certain degree of smoothness (or a certain number of 
vanishing moments) then only those lifting parameters which result in a number of 
vanishing moments within this range are used. Formally, this could be expressed by 
placing a distribution over the hyperparameters used in the kernel of the 
probability density function of admissible lifting functions. 
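
A hard version of such a prior, which simply rejects bases whose number of vanishing moments falls outside a given range, might be sketched as follows. The candidate filters, the dictionary layout, and the function names are assumptions made for illustration; a soft prior would assign weights instead of accepting or rejecting.

```python
import numpy as np

def vanishing_moments(g, max_order=8, tol=1e-10):
    """Count consecutive p = 0, 1, ... for which sum_k k**p * g[k] vanishes."""
    g = np.asarray(g, dtype=float)
    k = np.arange(len(g), dtype=float)
    count = 0
    for p in range(max_order):
        if abs(np.sum(k ** p * g)) > tol:
            break
        count += 1
    return count

def admissible(candidates, lo, hi):
    """Names of candidate high-pass filters with lo <= vanishing moments <= hi."""
    return [name for name, g in candidates.items()
            if lo <= vanishing_moments(g) <= hi]

s3 = np.sqrt(3.0)
h4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
candidates = {
    "haar": np.array([0.5, -0.5]),                      # Haar high-pass filter
    "daub4": np.array([h4[3], -h4[2], h4[1], -h4[0]]),  # four-tap Daubechies
}
```

Here `admissible(candidates, 2, 3)` keeps only the four-tap filter, since the Haar wavelet has a single vanishing moment.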



2.3 THRESHOLD SELECTION 


Since the wavelet transform is a linear operator, if sampled data has noise, then 
the decomposition coefficients will have the same form of noise. The idea behind 
wavelet regression is that the decomposition coefficients that have a small 
magnitude are substantially representative of the noise component of the sampled 
data. A threshold is selected and then all coefficients which are below the 
threshold in magnitude are set to zero; the remaining coefficients are either kept 
unchanged (a hard threshold) or moved towards zero (a soft threshold). The soft 
threshold $\eta_t(y) = \mathrm{sgn}(y)\,(|y| - t)_+$ is used in this study. 
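
The two thresholding rules can be written directly. This is a minimal sketch with hypothetical function names, where `t` is the threshold and `y` an array of decomposition coefficients.

```python
import numpy as np

def soft_threshold(y, t):
    """sgn(y) * max(|y| - t, 0): zero small coefficients, shrink the rest by t."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def hard_threshold(y, t):
    """Zero coefficients with |y| <= t and leave the others unchanged."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) > t, y, 0.0)
```

For coefficients `[-3, -0.5, 0.5, 3]` and `t = 1`, the soft rule gives magnitudes `[2, 0, 0, 2]` while the hard rule keeps `[3, 0, 0, 3]`, so the soft rule is continuous in `y` at the cost of biasing the large coefficients towards zero.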

There are two basic methods of threshold selection: 1. Donoho's [Donoho, 95] 
analytic method which relies on knowledge of the noise distribution (such as a 
Gaussian noise source with a certain variance); 2. a cross-validation approach (many 
of which are reviewed in [Nason, 96]). It is beyond the scope of this paper to review 
these methods. Leave-one-out cross-validation was used in this study. 
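
To give a flavour of cross-validated threshold selection, the sketch below scores a candidate threshold by a simplified even/odd split: one half of the samples is denoised and used to predict the other half by averaging neighbouring fitted values. This is a swapped-in, cheaper scheme chosen for brevity; it is not the leave-one-out procedure with padding used in this study. The Haar basis, periodic boundary handling, and all function names are assumptions.

```python
import numpy as np

def haar_denoise(x, t):
    """Soft-threshold all Haar detail coefficients of x at t, then invert."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        d = (even - odd) / np.sqrt(2.0)
        details.append(np.sign(d) * np.maximum(np.abs(d) - t, 0.0))
        approx = (even + odd) / np.sqrt(2.0)
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def cv_score(y, t):
    """Mean squared prediction error of a two-fold even/odd split."""
    y = np.asarray(y, dtype=float)
    even, odd = y[0::2], y[1::2]
    score = 0.0
    # shift = -1: odd sample i lies between even samples i and i + 1;
    # shift = +1: even sample i lies between odd samples i - 1 and i.
    for train, held_out, shift in ((even, odd, -1), (odd, even, 1)):
        fit = haar_denoise(train, t)
        pred = 0.5 * (fit + np.roll(fit, shift))  # periodic at the boundary
        score += np.sum((pred - held_out) ** 2)
    return score / y.size

def select_threshold(y, candidates):
    """Candidate threshold with the smallest cross-validation score."""
    return min(candidates, key=lambda t: cv_score(y, t))
```

A usage example would be `select_threshold(y, np.linspace(0.0, 3.0, 31))` for a noisy signal `y` whose length is a power of two and at least four. Because each half is sampled at half the original rate, a threshold chosen this way generally needs adjusting before it is applied to the full-length signal; that correction is omitted here.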

3 METHODOLOGY 

The test functions used in this study are the four functions published by Donoho 
and Johnstone [Donoho and Johnstone, 94]. These functions have been adopted 
by the wavelet regression community to aid in comparison of algorithms across 
publications. 

Each function was uniformly sampled to contain 2048 points. Gaussian white noise 
was added so that the signal to noise ratio (SNR) was 7.0. Fifty replicates of each 
noisy function were created, of which four examples are depicted in Figure 1. 
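
The data generation step can be sketched for two of the four test functions. The HeaviSine and Doppler formulas below are the standard Donoho and Johnstone definitions. The sampling grid `t = i/n`, the reading of SNR as the ratio of the signal's standard deviation to the noise standard deviation, the random seed, and the function names are assumptions made for this illustration.

```python
import numpy as np

def heavisine(t):
    """Donoho-Johnstone HeaviSine test function on [0, 1]."""
    return 4.0 * np.sin(4.0 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

def doppler(t, eps=0.05):
    """Donoho-Johnstone Doppler test function on [0, 1]."""
    return np.sqrt(t * (1.0 - t)) * np.sin(2.0 * np.pi * (1.0 + eps) / (t + eps))

def noisy_replicates(f, n=2048, snr=7.0, reps=50, seed=0):
    """Return the clean samples and `reps` noisy copies at the given SNR."""
    t = np.arange(n) / n                      # assumed uniform grid on [0, 1)
    clean = f(t)
    sigma = np.std(clean) / snr               # assumed SNR = sd(signal) / sd(noise)
    rng = np.random.default_rng(seed)
    noisy = clean + sigma * rng.standard_normal((reps, n))
    return clean, noisy
```

For example, `clean, noisy = noisy_replicates(doppler)` yields an array of shape `(50, 2048)` whose rows are independent noisy copies of the same sampled function.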

The noise removal process involves three steps. The first step is to perform a discrete 
wavelet transform using a particular basis. A threshold is then selected for the 
resulting decomposition coefficients using leave-one-out cross-validation with padding. 

The soft threshold is then applied to the decomposition. Next the inverse wavelet 
transform is applied to obtain a denoised version of the original signal. 

3.1 WAVELET BASIS SELECTION 

To demonstrate the effect of basis selection on the threshold found and the resulting 
recovered signal, the following experiment was conducted. Two well-studied 
orthogonal wavelet families were used: Daubechies most compactly supported (DMCS), 
and Symlets (S) [Daubechies, 92]. For the DMCS family, filters of order 1 (which 
corresponds to the Haar wavelet) to 7 were used. For the Symlets, filters of order 2 
through 7 were used. For each filter, leave-one-out cross-validation was used to find 
a threshold which minimized the mean square error for each of the 50 replicates 
for the four test functions. The median threshold found for each case is presented. 
This median threshold is then applied to the decomposition of each of the replicates 
for each test function. The resulting reconstructed signals are then compared to 
the ideal function (the original before noise was added) and the Normalized Root 
Mean Square Error (NRMSE) is presented. 
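
The two summary quantities can be sketched as below. The normalization used for the NRMSE is not spelled out in the text, so dividing the root mean square error by the standard deviation of the ideal function is an assumption here, as are the function names.

```python
import numpy as np

def median_threshold(per_replicate_thresholds):
    """Median of the thresholds selected independently on each replicate."""
    return float(np.median(per_replicate_thresholds))

def nrmse(estimate, ideal):
    """Root mean square error divided by the spread of the ideal function.

    The normalizing constant (standard deviation of `ideal`) is an assumed
    convention; other choices, such as the range of `ideal`, are also common.
    """
    estimate = np.asarray(estimate, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    rmse = np.sqrt(np.mean((estimate - ideal) ** 2))
    return rmse / np.std(ideal)
```

In the experiment described above, `median_threshold` would be applied to the 50 cross-validated thresholds for one function and filter, and `nrmse` to each reconstruction obtained with that single median threshold.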

The number of points in the functions was varied from 2048 to 128 to demonstrate 
the sensitivity of the method to sample size. 



Table 1: Effects of Basis Selection 

Function  Filter  Family      Median     NRMSE     Median True  NRMSE
          Order               Thr. (MT)  using MT  Thr. (MTT)   using MTT
--------  ------  ----------  ---------  --------  -----------  ---------
Blocks    1       Daubechies  1.33       0.038     1.61         0.036
Blocks    2       Symmlets    1.245      0.045     1.40         0.045
Bumps     4       Daubechies  1.11       0.059     1.47         0.056
Bumps     5       Symmlets    1.13       0.058     1.48         0.055
Doppler   8       Daubechies  1.27       0.058     1.65         0.054
Doppler   8       Symmlets    1.36       0.054     1.74         0.050
Heavysin  2       Daubechies  1.97       0.039     2.17         0.038
Heavysin  5       Symmlets    1.985      0.039     2.16         0.038


3.2 INCORPORATING PRIOR INFORMATION: LIFTING PARAMETERS 

If the function that we are sampling is known to have certain smoothness 
properties, then a distribution of the admissible lifting coefficients representing 
a similar smoothness characteristic can be formed. However, it is not necessary to 
pick the prior carefully. The method was applied with a piecewise linear prior (the 
(2,2) biorthogonal wavelet of Cohen-Daubechies-Feauveau [Cohen, 92]) to the 
non-linear smooth test functions. It was then compared with several standard 
techniques: the Smoothing Spline method (SS) [Wahba, 90], Donoho's SureShrink 
method (SureShrink) [Donoho, 95], and an optimized Radial Basis Function Neural 
Network (RBFNN). 

4 RESULTS 

In the first experiment the procedure was only allowed to select between two 
well-known bases (Daubechies most compactly supported and Symmlet wavelets) with 
the desired filter order. Table 1 shows the filter order resulting in the lowest 
NRMSE for each filter family and function. As expected, the best basis for the noisy 
blocks function was the piecewise constant basis (Daubechies, order 1, i.e. the Haar 
wavelet). The doppler function, which has very high frequency components, required 
the highest filter order. 

The method presented, as well as many wavelet regression schemes, has the flaw that 
a large number of samples is required. The median threshold, minimum threshold, 
and maximum threshold found for the heavysin function for various sample sizes are 
shown in Table 2. The deviation in the selected threshold increases as the sample 
size decreases, and the thresholds found are typically too small to adequately remove 
the noise. However, this is a very common problem in many wavelet regression 
methods. 

The basis selection procedure (labelled CV-Wavelets in Table 3) was compared 
with Donoho's SureShrink, Wahba's Smoothing Splines, and an optimized RBFNN. 
The prior information was specified incorrectly, so that the procedure preferred 
bases near piecewise linear. The remarkable observation is that the method still did 
better than the others as measured by Mean Square Error. 





Figure 1: Noisy Test Functions (panels: Noisy Blocks Function, Noisy Bumps 
Function, Noisy Heavysin Function, Noisy Doppler Function) 










Figure 2: Recovered Functions (panels: Recovered Blocks Function, Recovered Bumps 
Function, Recovered Heavysin Function, Recovered Doppler Function) 







Table 2: Sample Size Comparison Table 

Size  Median Thr.  Min Thr.  Max Thr.
----  -----------  --------  --------
2048  1.97         1.65      2.33
1024  1.75         1.39      2.28
512   1.61         1.12      2.87
256   1.66         0.90      2.19
128   1.88         0.99      2.28


Table 3: Methods Comparison Table of MSE 

Function  SS     SureShrink  RBFNN  CV-Wavelets
--------  -----  ----------  -----  -----------
Blocks    0.546  0.398       1.281  0.362
Heavysin  0.075  0.062       0.113  0.051
Doppler   0.205  0.145       0.287  0.116